Unicode Plane
   HOME

TheInfoList



OR:

In the Unicode standard, a plane is a continuous group of 65,536 (216) code points. There are 17 planes, identified by the numbers 0 to 16, which corresponds with the possible values 00–1016 of the first two positions in six position
hexadecimal In mathematics and computing, the hexadecimal (also base-16 or simply hex) numeral system is a positional numeral system that represents numbers using a radix (base) of 16. Unlike the decimal system representing numbers using 10 symbols, hexa ...
format (U+''hhhhhh''). Plane 0 is the Basic Multilingual Plane (BMP), which contains most commonly used characters. The higher planes 1 through 16 are called "supplementary planes". The last code point in Unicode is the last code point in plane 16, U+10FFFF. As of Unicode version , five of the planes have assigned code points (characters), and seven are named. The limit of 17 planes is due to UTF-16, which can encode 220 code points (16 planes) as pairs of words, plus the BMP as a single word. UTF-8 was designed with a much larger limit of 231 (2,147,483,648) code points (32,768 planes), and would still be able to encode 221 (2,097,152) code points (32 planes) even under the current limit of 4 bytes. The 17 planes can accommodate 1,114,112 code points. Of these, 2,048 are
surrogates ''Surrogates'' is a 2009 American science fiction action film based on the 2005–2006 comic book series '' The Surrogates''. Directed by Jonathan Mostow, it stars Bruce Willis as Tom Greer, an FBI agent who ventures out into the real world to ...
(used to make the pairs in UTF-16), 66 are non-characters, and 137,468 are reserved for private use, leaving 974,530 for public assignment. Planes are further subdivided into Unicode blocks, which, unlike planes, do not have a fixed size. The 327 blocks defined in Unicode cover 26% of the possible code point space, and range in size from a minimum of 16 code points (sixteen blocks) to a maximum of 65,536 code points (Supplementary Private Use Area-A and -B, which constitute the entirety of planes 15 and 16). For future usage, ranges of characters have been tentatively mapped out for most known current and ancient writing systems.


Overview


Assigned characters


Basic Multilingual Plane

The first plane, plane 0, the Basic Multilingual Plane (BMP) contains characters for almost all modern languages, and a large number of
symbols A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise very different conc ...
. A primary objective for the BMP is to support the unification of prior character sets as well as characters for writing. Most of the assigned code points in the BMP are used to encode Chinese, Japanese, and Korean ( CJK) characters. The High Surrogate (U+D800–U+DBFF) and Low Surrogate (U+DC00–U+DFFF) codes are reserved for encoding non-BMP characters in UTF-16 by using a ''pair'' of 16- bit codes: one High Surrogate and one Low Surrogate. A single surrogate code point will never be assigned a character. 65,520 of the 65,536 code points in this plane have been allocated to a Unicode block, leaving just 16 code points in a single unallocated range (2FE0..2FEF). , the BMP comprises the following 164 blocks: * Basic Latin (Lower half of
ISO/IEC 8859-1 ISO/IEC 8859-1:1998, ''Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1'', is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in ...
: ISO/IEC 646:1991-IRV aka ASCII) (0000–007F) * Latin-1 Supplement (Upper half of
ISO/IEC 8859-1 ISO/IEC 8859-1:1998, ''Information technology — 8-bit single-byte coded graphic character sets — Part 1: Latin alphabet No. 1'', is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in ...
) (0080–00FF) * Latin Extended-A (0100–017F) * Latin Extended-B (0180–024F) * IPA Extensions (0250–02AF) * Spacing Modifier Letters (02B0–02FF) *
Combining Diacritical Marks Combining Diacritical Marks is a Unicode block containing the most common combining characters. It also contains the character "Combining Grapheme Joiner", which prevents canonical reordering of combining characters, and despite the name, actual ...
(0300–036F) * Greek and Coptic (0370–03FF) *
Cyrillic , bg, кирилица , mk, кирилица , russian: кириллица , sr, ћирилица, uk, кирилиця , fam1 = Egyptian hieroglyphs , fam2 = Proto-Sinaitic , fam3 = Phoenician , fam4 = G ...
(0400–04FF) *
Cyrillic Supplement Cyrillic Supplement is a Unicode block containing Cyrillic letters for writing several minority languages, including Abkhaz, Kurdish, Komi, Mordvin, Aleut The Aleuts ( ; russian: Алеуты, Aleuty) are the indigenous people of the ...
(0500–052F) * Armenian (0530–058F) * Aramaic Scripts: ** Hebrew (0590–05FF) ** Arabic (0600–06FF) ** Syriac (0700–074F) **
Arabic Supplement Arabic Supplement is a Unicode block that encodes Arabic Arabic (, ' ; , ' or ) is a Semitic language spoken primarily across the Arab world.Semitic languages: an international handbook / edited by Stefan Weninger; in collaboration with ...
(0750–077F) ** Thaana (0780–07BF) **
N'Ko N'Ko () is a script devised by Solomana Kante in 1949, as a modern writing system for the Mandé languages of West Africa. The term ''N'Ko'', which means ''I say'' in all Mandé languages, is also used for the Mandé literary standard writt ...
(07C0–07FF) ** Samaritan (0800–083F) **
Mandaic Mandaic may refer to: * Mandaic language * Mandaic alphabet ** Mandaic (Unicode block) Mandaic is a Unicode block containing characters of the Mandaic script used for writing the historic Eastern Aramaic, also called Classical Mandaic, and the m ...
(0840–085F) ** Syriac Supplement (0860–086F) ** Arabic Extended-B (0870–089F) ** Arabic Extended-A (08A0–08FF) * Brahmic scripts: ** Devanagari (0900–097F) ** Bengali (0980–09FF) ** Gurmukhi (0A00–0A7F) ** Gujarati (0A80–0AFF) ** Oriya (0B00–0B7F) ** Tamil (0B80–0BFF) **
Telugu Telugu may refer to: * Telugu language, a major Dravidian language of India *Telugu people, an ethno-linguistic group of India * Telugu script, used to write the Telugu language ** Telugu (Unicode block), a block of Telugu characters in Unicode S ...
(0C00–0C7F) ** Kannada (0C80–0CFF) ** Malayalam (0D00–0D7F) ** Sinhala (0D80–0DFF) **
Thai Thai or THAI may refer to: * Of or from Thailand, a country in Southeast Asia ** Thai people, the dominant ethnic group of Thailand ** Thai language, a Tai-Kadai language spoken mainly in and around Thailand *** Thai script *** Thai (Unicode block ...
(0E00–0E7F) ** Lao (0E80–0EFF) **
Tibetan Tibetan may mean: * of, from, or related to Tibet * Tibetan people, an ethnic group * Tibetan language: ** Classical Tibetan, the classical language used also as a contemporary written standard ** Standard Tibetan, the most widely used spoken dial ...
(0F00–0FFF) **
Myanmar Myanmar, ; UK pronunciations: US pronunciations incl. . Note: Wikipedia's IPA conventions require indicating /r/ even in British English although only some British English speakers pronounce r at the end of syllables. As John C. Wells, Joh ...
(1000–109F) * Georgian (10A0–10FF) *
Hangul Jamo This is the list of Hangul ''jamo'' (Korean alphabet letters which represent consonants and vowels in Korean) including obsolete ones. This list contains Unicode code points. In the lists below, * code points in were added in Unicode 5.2.
(1100–11FF) * Ethiopic (1200–137F) *
Ethiopic Supplement Ethiopic Supplement is a Unicode block containing extra Geʽez Geez (; ' , and sometimes referred to in scholarly literature as Classical Ethiopic) is an ancient Ethiopian Semitic language. The language originates from what is now north ...
(1380–139F) * Cherokee (13A0–13FF) *
Unified Canadian Aboriginal Syllabics Canadian syllabic writing, or simply syllabics, is a family of writing systems used in a number of Indigenous Canadian languages of the Algonquian, Inuit, and (formerly) Athabaskan language families. These languages had no formal writing sy ...
(1400–167F) * Ogham (1680–169F) * Runic (16A0–16FF) * Philippine scripts: **
Tagalog Tagalog may refer to: Language * Tagalog language, a language spoken in the Philippines ** Old Tagalog, an archaic form of the language ** Batangas Tagalog, a dialect of the language * Tagalog script, the writing system historically used for Tagal ...
(1700–171F) ** Hanunoo (1720–173F) ** Buhid (1740–175F) **
Tagbanwa The Tagbanwa people ( Tagbanwa: ) are one of the oldest ethnic groups in the Philippines, and can be mainly found in the central and northern Palawan. Research has shown that the Tagbanwa are possible descendants of the Tabon Man, thus making th ...
(1760–177F) * Khmer (1780–17FF) * Mongolian (1800–18AF) * Unified Canadian Aboriginal Syllabics Extended (18B0–18FF) * Brahmic scripts: **
Limbu Limbu may refer to: * Limbu people, an indigenous tribe living in Nepal, Sikkim and Bhutan ** Rambahadur Limbu (born 1939), Nepalese Gurkha recipient of the Victoria Cross * Limbu language * Limbu script ** Limbu (Unicode block) Limbu is a Unicod ...
(1900–194F) * Tai scripts: ** Tai Le (1950–197F) ** New Tai Lue (1980–19DF) **
Khmer Symbols Khmer Symbols is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes ( code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purpose ...
(19E0–19FF) ** Buginese (1A00–1A1F) **
Tai Tham Tai Tham script ('' Tham'' meaning "scripture") is the name given to an abugida writing system used mainly for a group of Southwestern Tai languages i.e., Northern Thai, Tai Lü, Khün and Lao; as well as the liturgical languages of Buddhism ...
(1A20–1AAF) * Combining Diacritical Marks Extended (1AB0–1AFF) * Indonesian scripts: ** Balinese (1B00–1B7F) **
Sundanese Sundanese may refer to: * Sundanese people * Sundanese language * Sundanese script Standard Sundanese script (''Aksara Sunda Baku'', ) is a writing system which is used by the Sundanese people. It is built based on Old Sundanese script (' ...
(1B80–1BBF) **
Batak Batak is a collective term used to identify a number of closely related Austronesian ethnic groups predominantly found in North Sumatra, Indonesia, who speak Batak languages. The term is used to include the Karo, Pakpak, Simalungun, Toba, ...
(1BC0–1BFF) * Lepcha (1C00–1C4F) *
Ol Chiki The Ol Chiki () script, also known as Ol Chemetʼ (Santali: ''ol'' 'writing', ''chemet'' 'learning'), Ol Ciki, Ol, and sometimes as the Santali alphabet invented by Pandit Raghunath Murmu in the year 1925, is the official writing system for San ...
(1C50–1C7F) *
Cyrillic Extended-C Cyrillic Extended-C is a Unicode block containing Cyrillic characters for facsimile reprinting Old Believer Old Believers or Old Ritualists, ''starovery'' or ''staroobryadtsy'' are Eastern Orthodox Christians who maintain the liturgical and r ...
(1C80–1C8F) *
Georgian Extended Georgian Extended is a Unicode block containing Georgian ''Mtavruli'' ( ka, მთავრული, "title" or "heading") letters that function as uppercase versions of their ''Mkhedruli The Georgian scripts are the three writing systems us ...
(1C90–1CBF) *
Sundanese Supplement Sundanese Supplement is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes ( code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation p ...
(1CC0–1CCF) *
Vedic Extensions Vedic Extensions is a Unicode block containing characters for representing tones and other vedic symbols in Devanagari and other Indic scripts. Related symbols (also used in many scripts to represent vedic accents) are defined in two other blocks ...
(1CD0–1CFF) * Latin supplements: ** Phonetic Extensions (1D00–1D7F) ** Phonetic Extensions Supplement (1D80–1DBF) ** Combining Diacritical Marks Supplement (1DC0–1DFF) ** Latin Extended Additional (1E00–1EFF) *
Greek Extended Greek Extended is a Unicode block containing the accented vowels necessary for writing polytonic Greek. The regular, unaccented Greek characters as well as the characters with tonos and diaeresis can be found in the Greek and Coptic block. ...
(1F00–1FFF) *
Symbols A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise very different conc ...
: ** General Punctuation (2000–206F) ** Superscripts and Subscripts (2070–209F) ** Currency Symbols (20A0–20CF) ** Combining Diacritical Marks for Symbols (20D0–20FF) ** Letterlike Symbols (2100–214F) **
Number Forms Number Forms is a Unicode block containing Unicode compatibility characters that have specific meaning as numbers, but are constructed from other characters. They consist primarily of vulgar fractions and Roman numerals. In addition to the cha ...
(2150–218F) ** Arrows (2190–21FF) ** Mathematical Operators (2200–22FF) ** Miscellaneous Technical (2300–23FF) ** Control Pictures (2400–243F) ** Optical Character Recognition (2440–245F) ** Enclosed Alphanumerics (2460–24FF) **
Box Drawing Box Drawing is a Unicode block containing characters for compatibility with legacy graphics standards that contained characters for making bordered charts and tables, i.e. box-drawing characters. Its block name in Unicode 1.0 was Form and Chart C ...
(2500–257F) ** Block Elements (2580–259F) ** Geometric Shapes (25A0–25FF) ** Miscellaneous Symbols (2600–26FF) ** Dingbats (2700–27BF) ** Miscellaneous Mathematical Symbols-A (27C0–27EF) **
Supplemental Arrows-A Supplemental Arrows-A is a Unicode block containing various arrow symbols. Block History The following Unicode-related documents record the purpose and process of defining specific characters in the Supplemental Arrows-A block: See also ...
(27F0–27FF) ** Braille Patterns (2800–28FF) **
Supplemental Arrows-B Supplemental Arrows-B is a Unicode block containing miscellaneous arrows, arrow tails, crossing arrows used in knot descriptions, curved arrows, and harpoons. Block Emoji The Supplemental Arrows-B block contains two emoji: U+2934–U+2935. ...
(2900–297F) ** Miscellaneous Mathematical Symbols-B (2980–29FF) ** Supplemental Mathematical Operators (2A00–2AFF) ** Miscellaneous Symbols and Arrows (2B00–2BFF) * Glagolitic (2C00–2C5F) * Latin Extended-C (2C60–2C7F) *
Coptic Coptic may refer to: Afro-Asia * Copts, an ethnoreligious group mainly in the area of modern Egypt but also in Sudan and Libya * Coptic language, a Northern Afro-Asiatic language spoken in Egypt until at least the 17th century * Coptic alphabet ...
(2C80–2CFF) * Georgian Supplement (2D00–2D2F) * Tifinagh (2D30–2D7F) *
Ethiopic Extended Ethiopic Extended is a Unicode block containing Geʽez Geez (; ' , and sometimes referred to in scholarly literature as Classical Ethiopic) is an ancient Ethiopian Semitic language. The language originates from what is now northern Ethi ...
(2D80–2DDF) *
Cyrillic Extended-A Cyrillic Extended-A is a Unicode block containing combining Cyrillic , bg, кирилица , mk, кирилица , russian: кириллица , sr, ћирилица, uk, кирилиця , fam1 = Egyptian hieroglyphs , fam2 ...
(2DE0–2DFF) *
Supplemental Punctuation Supplemental Punctuation is a Unicode block containing historic and specialized punctuation characters, including biblical editorial symbols, ancient Greek punctuation, and German dictionary marks. Additional punctuation characters are in the Ge ...
(2E00–2E7F) * CJK scripts and symbols: **
CJK Radicals Supplement CJK Radicals Supplement is a Unicode block containing alternative, often positional, forms of the Kangxi radicals The 214 Kangxi radicals (), also known as the Zihui radicals, form a system of radicals () of Chinese characters. The radicals ...
(2E80–2EFF) **
Kangxi Radicals The 214 Kangxi radicals (), also known as the Zihui radicals, form a system of radicals () of Chinese characters. The radicals are numbered in stroke count order. They are the most popular system of radicals for dictionaries that order Traditi ...
(2F00–2FDF) ** Ideographic Description Characters (2FF0–2FFF) ** CJK Symbols and Punctuation (3000–303F) ** Hiragana (3040–309F) ** Katakana (30A0–30FF) **
Bopomofo Bopomofo (), or Mandarin Phonetic Symbols, also named Zhuyin (), is a Chinese transliteration system for Mandarin Chinese and other related languages and dialects. More commonly used in Taiwanese Mandarin, it may also be used to transcribe ...
(3100–312F) ** Hangul Compatibility Jamo (3130–318F) ** Kanbun (3190–319F) **
Bopomofo Extended Bopomofo Extended is a Unicode block containing additional Bopomofo characters for writing phonetic Min Nan, Hakka Chinese, Cantonese, Hmu, and Ge. The basic set of Bopomofo characters can be found in the Bopomofo block. Block History The ...
(31A0–31BF) ** CJK Strokes (31C0–31EF) ** Katakana Phonetic Extensions (31F0–31FF) **
Enclosed CJK Letters and Months Enclosed CJK Letters and Months is a Unicode block containing circled and parenthesized Katakana, Hangul, and CJK ideographs. Also included in the block are miscellaneous glyphs that would more likely fit in CJK Compatibility or Enclosed ...
(3200–32FF) ** CJK Compatibility (3300–33FF) **
CJK Unified Ideographs Extension A CJK Unified Ideographs Extension-A is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes (code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and doc ...
(3400–4DBF) ** Yijing Hexagram Symbols (4DC0–4DFF) ** CJK Unified Ideographs (4E00–9FFF) *
Yi Syllables Yi Syllables is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes (code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. ...
(A000–A48F) *
Yi Radicals Yi Radicals is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes ( code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation purposes. ...
(A490–A4CF) * Lisu (A4D0–A4FF) * Vai (A500–A63F) *
Cyrillic Extended-B Cyrillic Extended-B is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes ( code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation pu ...
(A640–A69F) *
Bamum Bamum, also spelled Bamoum, Bamun, or Bamoun, may refer to: *The Bamum people *The Bamum kingdom *The Bamum language *The Bamum script ** Bamum (Unicode block) * Bamum Scripts and Archives Project {{Disambig Language and nationality disambiguation ...
(A6A0–A6FF) * Modifier Tone Letters (A700–A71F) * Latin Extended-D (A720–A7FF) * Brahmic scripts: **
Syloti Nagri Sylheti Nagri or Sylheti Nagari ( syl, , ISO: , ), known in classical manuscripts as Sylhet Nagri (, ''Sileṭ Nagri'') amongst many other names (see below), was an Indic script used to write the Sylheti language and Eastern Bengali languages. ...
(A800–A82F) **
Common Indic Number Forms Common Indic Number Forms is a Unicode block containing characters for representing fractions in north India, Pakistan, and Nepal. History The following Unicode-related documents record the purpose and process of defining specific characters i ...
(A830–A83F) ** Phags-pa (A840–A87F) ** Saurashtra (A880–A8DF) ** Devanagari Extended (A8E0–A8FF) ** Kayah Li (A900–A92F) ** Rejang (A930–A95F) *
Hangul Jamo Extended-A Hangul Jamo Extended-A is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes ( code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentati ...
(A960–A97F) * Brahmic scripts: ** Javanese (A980–A9DF) ** Myanmar Extended-B (A9E0–A9FF) ** Cham (AA00–AA5F) ** Myanmar Extended-A (AA60–AA7F) **
Tai Viet The Tai Viet script (Tai Dam: ("Tai script"), Vietnamese: Chữ Thái Việt) ( th, อักษรไทดำ, ) is a Brahmic script used by the Tai Dam people and various other Thai people in Vietnam and Thailand.Meetei Mayek Extensions (AAE0–AAFF) *
Ethiopic Extended-A Ethiopic Extended-A is a Unicode block containing Geʽez Geez (; ' , and sometimes referred to in scholarly literature as Classical Ethiopic) is an ancient Ethiopian Semitic language. The language originates from what is now northern Et ...
(AB00–AB2F) * Latin Extended-E (AB30–AB6F) * Cherokee Supplement (AB70–ABBF) * Meetei Mayek (ABC0–ABFF) * Hangul Syllables (AC00–D7AF) *
Hangul Jamo Extended-B Hangul Jamo Extended-B is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes (code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation ...
(D7B0–D7FF) *
Surrogates ''Surrogates'' is a 2009 American science fiction action film based on the 2005–2006 comic book series '' The Surrogates''. Directed by Jonathan Mostow, it stars Bruce Willis as Tom Greer, an FBI agent who ventures out into the real world to ...
: ** High Surrogates (D800–DB7F) **
High Private Use Surrogates The Unicode Consortium and the ISO/IEC JTC 1/SC 2/Working group, WG 2 jointly collaborate on the list of the characters in the Universal Coded Character Set. The Universal Coded Character Set, most commonly called the Universal Character Set ...
(DB80–DBFF) ** Low Surrogates (DC00–DFFF) * Private Use Area (E000–F8FF) *
CJK Compatibility Ideographs CJK Compatibility Ideographs is a Unicode block created to contain Han characters that were encoded in multiple locations in other established character encodings, in addition to their CJK Unified Ideographs assignments, in order to retain roun ...
(F900–FAFF) * Alphabetic Presentation Forms (FB00–FB4F) * Arabic Presentation Forms-A (FB50–FDFF) * Variation Selectors (FE00–FE0F) * Vertical Forms (FE10–FE1F) *
Combining Half Marks Combining Half Marks is a Unicode block containing diacritical combining characters for spanning multiple characters. Block History The following Unicode-related documents record the purpose and process of defining specific characters in the C ...
(FE20–FE2F) * CJK Compatibility Forms (FE30–FE4F) * Small Form Variants (FE50–FE6F) * Arabic Presentation Forms-B (FE70–FEFF) * Halfwidth and Fullwidth Forms (FF00–FFEF) * Specials (FFF0–FFFF)


Supplementary Multilingual Plane

Plane 1, the Supplementary Multilingual Plane (SMP), contains historic scripts (except CJK ideographic), and symbols and notation used within certain fields. Scripts include
Linear B Linear B was a syllabic script used for writing in Mycenaean Greek, the earliest attested form of Greek. The script predates the Greek alphabet by several centuries. The oldest Mycenaean writing dates to about 1400 BC. It is descended from ...
, Egyptian hieroglyphs, and cuneiform scripts. It also includes English reform orthographies like Shavian and Deseret, and some modern scripts like
Osage The Osage Nation, a Native American tribe in the United States, is the source of most other terms containing the word "osage". Osage can also refer to: * Osage language, a Dhaegin language traditionally spoken by the Osage Nation * Osage (Unicode b ...
, Warang Citi, Adlam, Wancho and
Toto Toto may refer to: Arts and entertainment Fictional characters Pets * Toto (Oz), Toto (''Oz''), a dog in the novel and film ''The Wonderful Wizard of Oz'' * Toto, in Japanese ''The Cat Returns#Plot, The Cat Returns'' Characters of agency * a ...
. Symbols and notations include historic and modern
musical notation Music notation or musical notation is any system used to visually represent aurally perceived music played with instruments or sung by the human voice through the use of written, printed, or otherwise-produced symbols, including notation fo ...
; mathematical alphanumerics; shorthands;
Emoji An emoji ( ; plural emoji or emojis) is a pictogram, logogram, ideogram or smiley embedded in text and used in electronic messages and web pages. The primary function of emoji is to fill in emotional cues otherwise missing from typed conversat ...
and other pictographic sets; and game symbols for playing cards,
mahjong Mahjong or mah-jongg (English pronunciation: ) is a tile-based game that was developed in the 19th century in China and has spread throughout the world since the early 20th century. It is commonly played by four players (with some three-play ...
, and dominoes. , the SMP comprises the following 151 blocks: *
Archaic Greek Archaic Greece was the period in Greek history lasting from circa 800 BC to the second Persian invasion of Greece in 480 BC, following the Greek Dark Ages and succeeded by the Classical period. In the archaic period, Greeks settled across the M ...
and Other Left-to-right scripts: ** Linear B Syllabary (10000–1007F) **
Linear B Ideograms Linear B Ideograms is a Unicode block containing ideographic characters for writing Mycenaean Greek. Several Linear B ideographs double as syllabic letters, and are encoded in the Linear B Syllabary Linear B Syllabary is a Unicode block cont ...
(10080–100FF) ** Aegean Numbers (10100–1013F) ** Ancient Greek Numbers (10140–1018F) **
Ancient Symbols Ancient Symbols is a Unicode block containing Roman characters for currency, weights, and measures. Block History The following Unicode-related documents record the purpose and process of defining specific characters in the Ancient Symbols blo ...
(10190–101CF) ** Phaistos Disc (101D0–101FF) ** Lycian (10280–1029F) ** Carian (102A0–102DF) ** Coptic Epact Numbers (102E0–102FF) ** Old Italic (10300–1032F) **
Gothic Gothic or Gothics may refer to: People and languages *Goths or Gothic people, the ethnonym of a group of East Germanic tribes **Gothic language, an extinct East Germanic language spoken by the Goths **Crimean Gothic, the Gothic language spoken b ...
(10330–1034F) ** Old Permic (10350–1037F) ** Ugaritic (10380–1039F) **
Old Persian Old Persian is one of the two directly attested Old Iranian languages (the other being Avestan language, Avestan) and is the ancestor of Middle Persian (the language of Sasanian Empire). Like other Old Iranian languages, it was known to its native ...
(103A0–103DF) ** Deseret (10400–1044F) ** Shavian (10450–1047F) ** Osmanya (10480–104AF) **
Osage The Osage Nation, a Native American tribe in the United States, is the source of most other terms containing the word "osage". Osage can also refer to: * Osage language, a Dhaegin language traditionally spoken by the Osage Nation * Osage (Unicode b ...
(104B0–104FF) ** Elbasan (10500–1052F) ** Caucasian Albanian (10530–1056F) ** Vithkuqi (10570–105BF) ** Linear A (10600–1077F) ** Latin Extended-F (10780–107BF) * Right-to-left scripts: ** Cypriot Syllabary (10800–1083F) ** Imperial Aramaic (10840–1085F) ** Palmyrene (10860–1087F) ** Nabataean (10880–108AF) ** Hatran (108E0–108FF) ** Phoenician (10900–1091F) ** Lydian (10920–1093F) **
Meroitic Hieroglyphs The Meroitic script consists of two alphasyllabic scripts developed to write the Meroitic language at the beginning of the Meroitic Period (3rd century BC) of the Kingdom of Kush. The two scripts are Meroitic Cursive, derived from Demotic Egypt ...
(10980–1099F) **
Meroitic Cursive The Meroitic script consists of two alphasyllabic scripts developed to write the Meroitic language at the beginning of the Meroitic Period (3rd century BC) of the Kingdom of Kush. The two scripts are Meroitic Cursive, derived from Demotic Egyp ...
(109A0–109FF) ** Kharoshthi (10A00–10A5F) ** Old South Arabian (10A60–10A7F) **
Old North Arabian Ancient North Arabian (ANA)http://e-learning.tsu.ge/pluginfile.php/5868/mod_resource/content/0/dzveli_armosavluri_enebi_-ugarituli_punikuri_arameuli_ebrauli_arabuli.pdf is a collection of writing system, scripts and possibly a language or family ...
(10A80–10A9F) **
Manichaean Manichaeism (; in New Persian ; ) is a former major religionR. van den Broek, Wouter J. Hanegraaff ''Gnosis and Hermeticism from Antiquity to Modern Times''SUNY Press, 1998 p. 37 founded in the 3rd century AD by the Parthian Empire, Parthian ...
(10AC0–10AFF) **
Avestan Avestan (), or historically Zend, is an umbrella term for two Old Iranian languages: Old Avestan (spoken in the 2nd millennium BCE) and Younger Avestan (spoken in the 1st millennium BCE). They are known only from their conjoined use as the scrip ...
(10B00–10B3F) ** Inscriptional Parthian (10B40–10B5F) ** Inscriptional Pahlavi (10B60–10B7F) ** Psalter Pahlavi (10B80–10BAF) ** Old Turkic (10C00–10C4F) ** Old Hungarian (10C80–10CFF) ** Hanifi Rohingya (10D00–10D3F) **
Rumi Numeral Symbols Rumi Numeral Symbols is a Unicode block containing numeric characters used in Fez, Morocco Fez or Fes (; ar, فاس, fās; zgh, ⴼⵉⵣⴰⵣ, fizaz; french: Fès) is a city in northern inland Morocco and the capital of the Fès-Meknè ...
(10E60–10E7F) ** Yezidi (10E80–10EBF) **
Arabic Extended-C Arabic Extended-C is a Unicode block encoding Qur'anic marks used in Turkey Turkey ( tr, Türkiye ), officially the Republic of Türkiye ( tr, Türkiye Cumhuriyeti, links=no ), is a list of transcontinental countries, transcontinental co ...
(10EC0–10EFF) ** Old Sogdian (10F00–10F2F) ** Sogdian (10F30–10F6F) **
Old Uyghur Old Uyghur () was a Turkic language which was spoken in Qocho from the 9th–14th centuries and in Gansu. History The Old Uyghur language evolved from Old Turkic after the Uyghur Khaganate broke up and remnants of it migrated to Turfan, Qomu ...
(10F70–10FAF) ** Chorasmian (10FB0–10FDF) **
Elymaic The Elymaic alphabet is a right-to-left, non-joining abjad. It is derived from the Aramaic alphabet. Elymaic was used in the ancient state of Elymais, which was a semi-independent state of the 2nd century BCE to the early 3rd century CE, frequentl ...
(10FE0–10FFF) * Brahmic scripts: ** Brahmi (11000–1107F) ** Kaithi (11080–110CF) ** Sora Sompeng (110D0–110FF) **
Chakma Chakma may refer to: *Chakma people, a Tibeto-Burman people of Bangladesh and Northeast India *Chakma language, the Indo-Aryan language spoken by them **Chakma script ***Chakma (Unicode block) Chakma is a Unicode block containing characters for ...
(11100–1114F) **
Mahajani Mahajani is a Laṇḍā mercantile script that was historically used in northern India for writing accounts and financial records in Marwari, Hindi and Punjabi. It is a Brahmic script and is written left-to-right. Mahajani refers to the Hin ...
(11150–1117F) ** Sharada (11180–111DF) ** Sinhala Archaic Numbers (111E0–111FF) ** Khojki (11200–1124F) ** Multani (11280–112AF) ** Khudawadi (112B0–112FF) ** Grantha (11300–1137F) ** Newa (11400–1147F) ** Tirhuta (11480–114DF) ** Siddham (11580–115FF) ** Modi (11600–1165F) ** Mongolian Supplement (11660–1167F) ** Takri (11680–116CF) **
Ahom Ahom may refer to: *Ahom people, an ethnic community in Assam * Ahom language, a language associated with the Ahom people *Ahom religion, an ethnic folk religion of Tai-Ahom people *Ahom alphabet, a script used to write the Ahom language * Ahom kin ...
(11700–1174F) ** Dogra (11800–1184F) ** Warang Citi (118A0–118FF) ** Dives Akuru (11900–1195F) ** Nandinagari (119A0–119FF) ** Zanabazar Square (11A00–11A4F) ** Soyombo (11A50–11AAF) *
Unified Canadian Aboriginal Syllabics Extended-A Unified Canadian Aboriginal Syllabics Extended-A is a Unicode block containing extensions to the Canadian syllabics contained in the Unified Canadian Aboriginal Syllabics Unicode block. The extension adds missing characters for Nattilik and hi ...
(11AB0–11ABF) * Brahmic scripts: ** Pau Cin Hau (11AC0–11AFF) ** Devanagari Extended-A (11B00–11B5F) ** Bhaiksuki (11C00–11C6F) ** Marchen (11C70–11CBF) ** Masaram Gondi (11D00–11D5F) ** Gunjala Gondi (11D60–11DAF) **
Makasar Makassar (, mak, ᨆᨀᨔᨑ, Mangkasara’, ) is the capital of the Indonesian province of South Sulawesi. It is the largest city in the region of Eastern Indonesia and the country's fifth-largest urban center after Jakarta, Surabaya, Medan ...
(11EE0–11EFF) **
Kawi Kawi may refer to: * Kawi language, oldest attested phase of the Javanese language * Kawi script, writing system used across Southeast Asia from the 8th century to around 1500 AD ::Kawi (Unicode block), the script in Unicode * Mount Kawi, a volcano ...
(11F00–11F5F) * Lisu Supplement (11FB0–11FBF) *
Tamil Supplement Tamil Supplement is a Unicode block containing Tamil Tamil may refer to: * Tamils, an ethnic group native to India and some other parts of Asia ** Sri Lankan Tamils, Tamil people native to Sri Lanka also called ilankai tamils **Tamil Malaysians ...
(11FC0–11FFF) * Cuneiform (12000–123FF) * Cuneiform Numbers and Punctuation (12400–1247F) * Early Dynastic Cuneiform (12480–1254F) * Cypro-Minoan (12F90–12FFF) * Egyptian Hieroglyphs (13000–1342F) * Egyptian Hieroglyph Format Controls (13430–1345F) * Anatolian Hieroglyphs (14400–1467F) *
Bamum Supplement Bamum Supplement is a Unicode block containing the characters of the historic stage A-F of the Bamum script, used for writing the Bamum language Bamum (Shü Pamom "language of the Bamum", or ''Shümom'' "Mum language"), also spelled Bamun or ...
(16800–16A3F) * Mro (16A40–16A6F) * Tangsa (16A70–16ACF) * Bassa Vah (16AD0–16AFF) * Pahawh Hmong (16B00–16B8F) * Medefaidrin (16E40–16E9F) * Miao (16F00–16F9F) * Ideographic Symbols and Punctuation (16FE0–16FFF) * Tangut (17000–187FF) * Tangut Components (18800–18AFF) * Khitan Small Script (18B00–18CFF) * Tangut Supplement (18D00–18D7F) * Kana Extended-B (1AFF0–1AFFF) *
Kana Supplement Kana Supplement is a Unicode block containing one archaic katakana character and 255 hentaigana (non-standard Hiragana) characters. Additional hentaigana characters are encoded in the Kana Extended-A block. Block History The following Unico ...
(1B000–1B0FF) *
Kana Extended-A Kana Extended-A is a Unicode block containing hentaigana (non-standard hiragana) and historic kana characters. Additional hentaigana characters are encoded in the Kana Supplement block. Block History The following Unicode-related documents reco ...
(1B100–1B12F) * Small Kana Extension (1B130–1B16F) * Nushu (1B170–1B2FF) * Duployan (1BC00–1BC9F) * Shorthand Format Controls (1BCA0–1BCAF) *
Symbols A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise very different conc ...
: **
Musical notation Music notation or musical notation is any system used to visually represent aurally perceived music played with instruments or sung by the human voice through the use of written, printed, or otherwise-produced symbols, including notation fo ...
: ***
Znamenny Musical Notation Znamenny Musical Notation is a Unicode block containing characters for Znamenny musical notation from Russia. Few fonts support this block as of 2021. Ones that do and are free for personal use include '' Symbola'' 14.0 and Slavonic' 1.00 (non-c ...
(1CF00–1CFCF) *** Byzantine Musical Symbols (1D000–1D0FF) *** Musical Symbols (1D100–1D1FF) *** Ancient Greek Musical Notation (1D200–1D24F) ** Kaktovik Numerals (1D2C0–1D2DF) **
Mayan Numerals The Mayan numeral system was the system to represent numbers and calendar dates in the Maya civilization. It was a vigesimal (base-20) positional numeral system. The numerals are made up of three symbols; zero (a shell), one (a dot) and f ...
(1D2E0–1D2FF) ** Mathematical symbols: *** Tai Xuan Jing Symbols (1D300–1D35F) *** Counting Rod Numerals (1D360–1D37F) *** Mathematical Alphanumeric Symbols (1D400–1D7FF) **
Sutton SignWriting Sutton SignWriting, or simply SignWriting, is a system of writing sign languages. It is highly featural and visually iconic, both in the shapes of the characters, which are abstract pictures of the hands, face, and body, and in their spatial arr ...
(1D800–1DAAF) * Latin Extended-G (1DF00–1DFFF) *
Glagolitic Supplement Glagolitic Supplement is a Unicode block A Unicode block is one of several contiguous ranges of numeric character codes ( code points) of the Unicode character set that are defined by the Unicode Consortium for administrative and documentation ...
(1E000–1E02F) *
Cyrillic Extended-D Cyrillic Extended-D is a Unicode block containing superscript and subscript Cyrillic characters used in Cyrillic-based phonetic transcription. The block contains the first Cyrillic characters defined outside of the Basic Multilingual Plane In t ...
(1E030–1E08F) * Nyiakeng Puachue Hmong (1E100–1E14F) *
Toto Toto may refer to: Arts and entertainment Fictional characters Pets * Toto (Oz), Toto (''Oz''), a dog in the novel and film ''The Wonderful Wizard of Oz'' * Toto, in Japanese ''The Cat Returns#Plot, The Cat Returns'' Characters of agency * a ...
(1E290–1E2BF) * Wancho (1E2C0–1E2FF) * Nag Mundari (1E4D0–1E4FF) *
Ethiopic Extended-B Ethiopic Extended-B is a Unicode block containing additional Geʽez characters for the Gurage languages The Gurage languages (Gurage: ጉራጌ), also known as Guragie, is a dialect-continuum language, which belong to the Semitic branch of ...
(1E7E0–1E7FF) *
Mende Kikakui The Mende Kikakui script is a syllabary used for writing the Mende language of Sierra Leone. History It was devised by Mohamed Turay (born ca. 1850), an Islamic scholar, at a town called Maka (Barri Chiefdom, southern Sierra Leone). One of ...
(1E800–1E8DF) * Adlam (1E900–1E95F) *
Symbols A symbol is a mark, sign, or word that indicates, signifies, or is understood as representing an idea, object, or relationship. Symbols allow people to go beyond what is known or seen by creating linkages between otherwise very different conc ...
: ** Indic Siyaq Numbers (1EC70–1ECBF) **
Ottoman Siyaq Numbers Ottoman Siyaq Numbers is a Unicode block containing a specialized subset of the Arabic script that was used for accounting in Ottoman Turkish language, Ottoman Turkish documents. Block History The following Unicode-related documents record the ...
(1ED00–1ED4F) ** Arabic Mathematical Alphabetic Symbols (1EE00–1EEFF) ** Game tiles and cards: *** Mahjong Tiles (1F000–1F02F) *** Domino Tiles (1F030–1F09F) *** Playing Cards (1F0A0–1F0FF) **
Enclosed Alphanumeric Supplement Enclosed Alphanumeric Supplement is a Unicode block consisting of Latin alphabet characters and Arabic numerals enclosed in circles, ovals or boxes, used for a variety of purposes. It is encoded in the range U+1F100–U+1F1FF in the Supplem ...
(1F100–1F1FF) ** Enclosed Ideographic Supplement (1F200–1F2FF) ** Miscellaneous Symbols and Pictographs (1F300–1F5FF) **
Emoticons An emoticon (, , rarely , ), short for "emotion icon", also known simply as an emote, is a pictorial representation of a facial expression using characters—usually punctuation marks, numbers, and letters—to express a person's feelings, m ...
(1F600–1F64F) **
Ornamental Dingbats Ornamental Dingbats is a Unicode block containing ornamental leaves, punctuation, and ampersands, quilt squares, and checkerboard patterns. It is a subset of dingbat fonts Webdings, Wingdings, and Wingdings 2. History The following Unicode- ...
(1F650–1F67F) ** Transport and Map Symbols (1F680–1F6FF) ** Alchemical Symbols (1F700–1F77F) **
Geometric Shapes Extended Geometric Shapes Extended is a Unicode block containing Webdings/Wingdings symbols, mostly different weights of squares, crosses, and saltires, and different weights of variously spoked asterisks, stars, and various color squares and circles for ...
(1F780–1F7FF) ** Supplemental Arrows-C (1F800–1F8FF) **
Supplemental Symbols and Pictographs Supplemental Symbols and Pictographs is a Unicode block containing emoji characters. It extends the set of symbols included in the Miscellaneous Symbols and Pictographs block. It also includes Typikon symbols. Emoji The Unicode 14.0 Supplementa ...
(1F900–1F9FF) ** Chess Symbols (1FA00–1FA6F) **
Symbols and Pictographs Extended-A Symbols and Pictographs Extended-A is a Unicode block containing emoji characters. It extends the set of symbols included in the Supplemental Symbols and Pictographs block. All of the characters in the Symbols and Pictographs Extended-A block a ...
(1FA70–1FAFF) ** Symbols for Legacy Computing (1FB00–1FBFF)


Supplementary Ideographic Plane

Plane 2, the Supplementary Ideographic Plane (SIP), is used for CJK Ideographs, mostly CJK Unified Ideographs, that were not included in earlier character encoding standards. , the SIP comprises the following six blocks: * CJK Unified Ideographs Extension B (20000–2A6DF) * CJK Unified Ideographs Extension C (2A700–2B73F) * CJK Unified Ideographs Extension D (2B740–2B81F) * CJK Unified Ideographs Extension E (2B820–2CEAF) * CJK Unified Ideographs Extension F (2CEB0–2EBEF) * CJK Compatibility Ideographs Supplement (2F800–2FA1F)


Tertiary Ideographic Plane

Plane 3 is the Tertiary Ideographic Plane (TIP).
CJK Unified Ideographs Extension G CJK Unified Ideographs Extension G is a Unicode block containing rare and historic CJK Unified Ideographs for Chinese, Japanese, Korean, and Vietnamese. It is the first block to be allocated to the Tertiary Ideographic Plane. The exotic characte ...
was added to the TIP in Unicode 13.0, released in March 2020. It also is tentatively allocated for
Oracle Bone script Oracle bone script () is an ancient form of Chinese characters that were engraved on oracle bonesanimal bones or turtle plastrons used in pyromantic divination. Oracle bone script was used in the late 2nd millennium BC, and is the earliest kno ...
and
Small Seal Script The small seal script (), or Qin script (, ''Qínzhuàn''), is an archaic form of Chinese calligraphy. It was standardized and promulgated as a national standard by the government of Qin Shi Huang, the founder of the Chinese Qin dynasty. Name ...
. , the TIP comprises the following two blocks: *
CJK Unified Ideographs Extension G CJK Unified Ideographs Extension G is a Unicode block containing rare and historic CJK Unified Ideographs for Chinese, Japanese, Korean, and Vietnamese. It is the first block to be allocated to the Tertiary Ideographic Plane. The exotic characte ...
(30000–3134F) * CJK Unified Ideographs Extension H (31350–323AF)


Unassigned planes

Planes 4 to 13 (planes to in
hexadecimal In mathematics and computing, the hexadecimal (also base-16 or simply hex) numeral system is a positional numeral system that represents numbers using a radix (base) of 16. Unlike the decimal system representing numbers using 10 symbols, hexa ...
): No characters have yet been assigned, or proposed for assignment, to Planes 4 through 13.


Supplementary Special-purpose Plane

Plane 14 ( in hexadecimal) is designated as the Supplementary Special-purpose Plane (SSP). It comprises the following two blocks, : * Tags (E0000–E007F) * Variation Selectors Supplement (E0100–E01EF) – used to indicate alternate glyphs for characters.


Private Use Area Planes

The two planes 15 and 16 (planes and in hexadecimal) each contain a " Private Use Area". They contain blocks named Supplementary Private Use Area-A (PUA-A) and -B (PUA-B). The Private Use Areas are available for use by parties outside ISO and Unicode (private character encoding).


References

{{Unicode navigation Plane